Abstract. In some applications of matching, the structural or hierarchical
properties of the two graphs being aligned must be maintained.
The hierarchical properties are induced by the direction of the edges in 
the two directed graphs. These structural relationships defined by the 
hierarchy in the graphs act as a constraint on the alignment. In this pa- 
per, we formalize the above problem as the weighted alignment between 
two directed acyclic graphs. We prove that this problem is NP-complete, 
show several upper bounds for approximating the solution, and finally 
introduce polynomial time algorithms for sub-classes of directed acyclic 
graphs. 

1 The problem 

Matching or alignment problems are an important set of theoretical problems
that appear in many different applications [3,4,9]. Depending on the structure
of the problem, polynomial time algorithms may or may not exist. In this paper,
we propose a new type of matching problem called the weighted hierarchical DAG
(directed acyclic graph) alignment problem. In this problem, we have two directed 
acyclic graphs and a set of possible matchings between vertices in both graphs. 
We wish to find the maximum weighted matching between the vertices where the 
directed edges in both graphs act as hierarchical constraints on possible solutions 
to the matching. For example, if a vertex $v_1$ has a directed edge to a vertex $v_2$,
then any vertex matched to $v_2$ cannot be an ancestor of $v_1$'s matched vertex (see
Figures 1 and 2).

We became interested in this problem through our interest in ontology alignment.
An ontology is a conceptualization of a domain [12]. This conceptualization
consists of a set of terms with certain semantics and relationships [24]. Gener- 
ally, the terms are related by is_a relationships. The relationships (edges) and 
terms (vertices) can be represented as a DAG. With ontology alignment, one 
wants to align terms from two different ontologies in order to merge, compare, 
or map the ontologies. Since the edges of the DAG represent an is_a relationship,
applying the strictest sense of this relationship constrains the
number of valid matchings, because we do not wish to violate this relationship
in the corresponding matching.



This type of hierarchical or structural constraint is important in other appli- 
cations as well. The domains of SVG (Scalable Vector Graphics) version compari- 
son, source code comparison/merging, UML difference calculation, and file/folder 
merging are all instances of hierarchy-based matching. For example, an SVG
document is rich with structure. The document defines graphical objects and
how they relate; a form of the is_a relationship exists through the document's
graphic layers. In object-oriented programming, is_a relationships exist through
the definitions of inheritance, and other relationships exist via class membership. 
Similarly, UML diagrams have structural relationships, and different versions of 
diagrams sometimes need to be merged or have their differences calculated for vi- 
sual comparison [20] . Finally, in a file system, the folders represent an embedded 
hierarchy. 




Fig. 1. Example of a valid matching between two graphs. The dashed lines represent
valid assignments for the vertices A1, A2, A3, and A4.




Fig. 2. Example of an invalid matching between two graphs. The dashed lines
represent the assignments for the vertices A1, A2, A3, and A4. The two bold
dashed lines represent an assignment violation because A1 maps to a descendant
of A2's mapped vertex B1.

1.1 Related work 

General graph matching is a well studied problem. Most graph matching problems
can be divided into two categories, graph isomorphism and weighted graph
matching. In graph isomorphism, the goal is to find a matching function $f$ for
two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$. General graph isomorphism is still
open, that is, it is not known whether the problem is NP-hard or can be solved
in polynomial time [10]. Sub-graph isomorphism is known to be NP-complete
[11]. With weighted graph matching, we are given a graph $G = (V, E)$, where
the edges have associated weights, and we wish to find a subset $M$ of $E$ such
that no two edges in $M$ share a common end vertex and such that the sum of
edge weights in $M$ is maximum. For some classes of graphs, polynomial time
algorithms are known, while some others are known to be NP-complete.

Both of these problems have many practical applications; in particular, graph
isomorphism has received a lot of attention in the area of computer vision. Images
or objects can be represented as a graph. A weighted graph can be used
to formulate a structural description of an object [25]. There have been two
main approaches to solving graph isomorphism: state-space construction with 
searching and nonlinear optimization. The first method consists of building the 
state-space, which can then be searched. This method has an exponential run- 
ning time in the worst case scenario, but by employing heuristics, the search 
can be reduced to a low-order polynomial for many types of graphs [6,26]. With
the second approach (nonlinear optimization), the most successful approaches
have been relaxation labeling [16], neural networks [19], linear programming [1],
eigendecomposition [27], genetic algorithms [17], and Lagrangian relaxation [23].

Another type of graph problem related to ours is graph alignment through 
minimizing the edit distance [28,5]. In this problem, the graphs are transformed
via editing (deletion, insertion, relabelling) to achieve alignment. Our work is
different in several ways. First, we do not allow either graph to be edited, as
is typically done in the edit distance problem. Second, in the work discussed in
[28], the authors consider only undirected graphs as opposed to DAGs. Finally,
the authors of [5] deal with unweighted alignment of trees as opposed to weighted
alignment of DAGs. 

As mentioned, we became interested in the DAG alignment problem due to our
interest in ontology alignment. Ontology alignment has recently received a lot
of attention. An alignment between two ontologies can be formalized in terms 
of weighted graph matching, with certain constraints on the solution to any 
valid matching. Originally, alignments were performed by hand, and later, several
researchers introduced semi-automatic alignment strategies, which make
suggestions to the user about which terms to align [21,22]. Since then, fully
automatic alignment strategies have been explored. In [7], over twenty different
tools/algorithms are discussed. Many of these approaches use heuristics to
determine term similarities, first comparing syntactic, semantic, and structural
similarities, and then computing matches greedily or via some other local
optimization technique.

In [5], graph matching is applied to conceptual system matching for translation.
The work is very similar to ontology alignment; however, the authors formalize
their problem in terms of any conceptual system rather than restricting
the work specifically to an ontological formalization of a domain. They formalize
conceptual systems as graphs, and introduce algorithms for matching both
unweighted and weighted versions of these graphs.

1.2 Organization of the paper 

The remainder of the paper is organized as follows. The next section introduces 
notations and definitions that will be used throughout the paper. The definitions 
include the formal description of the problem. Following this, we show that the 
decision version of the problem is NP-complete via a reduction from 3SAT. Next, 
we prove two theorems, which yield upper bounds on approximating the DAG 
alignment problem. After this, we introduce a polynomial time algorithm for 
trees and discuss its possible modifications. Finally, we present some concluding 
remarks, a short discussion of open problems, and directions for future research. 

2 Notations and definitions 

2.1 Notations 

Before formally defining the DAG alignment problem we must first introduce
some definitions. A DAG is a directed graph $G = (V, E)$ that contains no
directed cycles, where $V$ is a set of vertices and $E$ is a set of edges. Let $anc(v)$
denote the set of ancestors for any $v \in V$, where an ancestor of $v$ is any $a \in V$
such that there exists a directed path from $a$ to $v$. Let $desc(v)$ denote the set
of descendants for any $v \in V$, where a descendant of $v$ is any $d \in V$ such that
there exists a directed path from $v$ to $d$. Finally, let $child(v)$ denote the set of
direct children for any $v \in V$, where a direct child is any $d \in V$ such that there
exists a directed edge from $v$ to $d$.
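The three sets defined above can be computed directly from an edge list. The following is a minimal sketch, assuming a DAG given as (parent, child) pairs; the function names and the example graph are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def children_map(edges):
    """Map each vertex to the set of its direct children, child(v)."""
    child = defaultdict(set)
    for u, v in edges:
        child[u].add(v)
    return child

def descendants(child, v):
    """desc(v): every vertex reachable from v by a directed path of length >= 1."""
    seen, stack = set(), list(child.get(v, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(child.get(u, ()))
    return seen

def ancestors(child, vertices, v):
    """anc(v): every vertex a that has a directed path from a to v."""
    return {a for a in vertices if v in descendants(child, a)}

# Hypothetical example DAG: r -> a, r -> b, a -> c, b -> c
edges = [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")]
vertices = {"r", "a", "b", "c"}
child = children_map(edges)
print(sorted(descendants(child, "r")), sorted(ancestors(child, vertices, "c")))
```

The ancestor computation here is deliberately naive (one traversal per vertex); for large graphs one would precompute a transitive closure once.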

2.2 Description of problem 

In this section we formalize the problem of DAG alignment with hierarchy con- 
straints. Without the hierarchy constraint, the problem reduces to weighted bi- 
partite matching, since the edges that represent vertex relationships would be 
ignored. As was mentioned, in many practical applications these structural re- 
lationships cannot be ignored. Due to these relationships, many solutions that 
would be valid in weighted bipartite matching are invalid. In fact, we can think 
of any edge e as having a set of conflicting edges, where a conflict is any edge 
that would violate a matching solution that contained e. We formalize this in 
the following definition. 

Definition 1. An edge conflict for edge $e = (a, b, w_e)$, $w_e \in [0, 1]$, is any edge
$d = (f, g, w_d)$, $w_d \in [0, 1]$, and $d \neq e$, where one of the following conditions
applies:

1. $a \in anc(f)$ and $b \notin anc(g)$.

2. $a \in desc(f)$ and $b \notin desc(g)$.

3. $a = f$.

4. $b = g$.



The set conf(e) denotes the set of edges that have edge conflicts with edge 
e. We can now give the formal definition of the DAG alignment problem. 

Definition 2. Given two DAGs, $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, and a set
of edges $\beta = \{(v_i, v_j, w_t)\}$ for all $v_i \in V_1$, all $v_j \in V_2$ and $w_t \in [0, 1]$, the DAG
alignment problem is to find the maximum weight matching, $M \subseteq \beta$, such that
each vertex in $M$ appears only once and for any edge $e \in M$, $conf(e) \cap M = \emptyset$.
We refer to this constraint on the matching as the hierarchical constraint for the 
remainder of this paper. 
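To make the two definitions concrete, here is a small brute-force sketch. It assumes that conditions 1 and 2 of Definition 1 read "a is an ancestor (descendant) of f, but b is not an ancestor (descendant) of g"; that reading, the function names, and the toy instance are assumptions made for illustration. The search enumerates every subset of candidate matches, so it is only usable on tiny inputs.

```python
from itertools import combinations

def reach(edges):
    """desc[v] = set of proper descendants of v, for a DAG given as (u, v) pairs."""
    verts = {x for e in edges for x in e}
    kids = {v: {b for a, b in edges if a == v} for v in verts}
    desc = {}
    def visit(v):
        if v not in desc:
            desc[v] = set()
            for c in kids[v]:
                desc[v] |= {c} | visit(c)
        return desc[v]
    for v in verts:
        visit(v)
    return desc

def conflicts(e, d, desc1, desc2):
    """Edge conflict test under the assumed reading of Definition 1."""
    (a, b, _), (f, g, _) = e, d
    if e == d:
        return False
    if a == f or b == g:                       # conditions 3 and 4
        return True
    a_anc_f = f in desc1.get(a, set())
    b_anc_g = g in desc2.get(b, set())
    a_desc_f = a in desc1.get(f, set())
    b_desc_g = b in desc2.get(g, set())
    # conditions 1 and 2
    return (a_anc_f and not b_anc_g) or (a_desc_f and not b_desc_g)

def best_alignment(beta, edges1, edges2):
    """Exhaustive search for the maximum weight conflict-free subset of beta."""
    desc1, desc2 = reach(edges1), reach(edges2)
    best, best_w = (), 0.0
    for r in range(1, len(beta) + 1):
        for M in combinations(beta, r):
            if all(not conflicts(e, d, desc1, desc2) for e, d in combinations(M, 2)):
                w = sum(t[2] for t in M)
                if w > best_w:
                    best, best_w = M, w
    return set(best), best_w

# Toy instance: G1 is A1 -> A2, G2 is B1 -> B2.
beta = [("A1", "B1", 0.9), ("A1", "B2", 0.8), ("A2", "B1", 0.7), ("A2", "B2", 0.4)]
M, w = best_alignment(beta, [("A1", "A2")], [("B1", "B2")])
print(M, w)
```

In this instance the heavier crossing pair (A1 to B2, A2 to B1) is rejected by the hierarchical constraint, so the order-preserving pair is returned instead.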

Our definition of the DAG alignment problem uses a complete bipartite graph
of all possible matchings with the set of edges $\beta = \{(v_i, v_j, w_t)\}$ defined for all
$v_i \in V_1$ and all $v_j \in V_2$. This may appear to narrow the set of problems we are
trying to solve; however, it does not. A problem with an incomplete matching
graph (some matchings may be inherently prohibited) can be reduced to the
problem with a complete bipartite graph through
the following consideration. Take a DAG alignment problem in which not every
node of $G_1$ can potentially be mapped to any node of $G_2$. Allow all the remaining
matchings, but assign zero weights to them. Solve the DAG alignment problem
with the complete set of possible matchings. Delete all zero weight matchings
from the solution. The result is a solution for the DAG alignment problem with
an incomplete set of possible matchings.
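The padding argument can be sketched in a few lines. This is an illustrative sketch only: `complete_beta` and `strip_zero` are hypothetical helper names, and the solver that would run between the two steps is not shown.

```python
from itertools import product

def complete_beta(v1, v2, partial):
    """Extend a partial list of (v_i, v_j, weight) candidates to all of V1 x V2.
    Pairs that are absent from `partial` receive weight 0.0."""
    weights = {(a, b): w for a, b, w in partial}
    return [(a, b, weights.get((a, b), 0.0)) for a, b in product(v1, v2)]

def strip_zero(matching):
    """Delete zero weight matchings from a solution of the completed problem."""
    return {(a, b, w) for a, b, w in matching if w > 0}

# Only A1-B1 is an admissible match in this toy instance.
full = complete_beta(["A1", "A2"], ["B1", "B2"], [("A1", "B1", 0.9)])
print(full)
print(strip_zero({("A1", "B1", 0.9), ("A2", "B2", 0.0)}))
```

Note that this sketch cannot distinguish a padded pair from an originally admissible pair whose weight happens to be zero; since both contribute nothing to the total weight, dropping them does not change the value of the solution.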

3 Intractability 

The DAG alignment problem defined in the previous section is NP-complete.
Before showing the proof of this, we begin by first defining the decision version 
of the problem. 

Definition 3. We are given two DAGs, $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, and
a set of edges $\beta = \{(v_i, v_j, w_t)\}$ for all $v_i \in V_1$, $v_j \in V_2$ and $w_t \in [0, 1]$. Let $w(A)$,
where $A \subseteq \beta$, be the sum of all weights $w_t$ defined over all triples $(v_i, v_j, w_t) \in A$.
Is there a matching $M \subseteq \beta$ with weight $w(M) \geq X$ and $|M| \leq Y$ such that each
vertex in $M$ appears only once and for any edge $e \in M$, $conf(e) \cap M = \emptyset$?

Theorem 1. DAG alignment, as introduced in Definition 3, is NP-complete. 

Proof. It is easy to see that the decision version of DAG alignment is in NP, so 
this will be omitted. 

We show a reduction of 3SAT to the decision version of the DAG alignment
problem. In 3SAT we have a finite set of variables, $X = \{x_1, x_2, \ldots, x_n\}$, and a
finite set of clauses $C = \{c_1, c_2, \ldots, c_m\}$, such that each clause is the logical OR of 3
literals, where the literals over variable $x_i$ are $x_i^0$ ($:= x_i$) and $x_i^1$ ($:= \neg x_i$). The
problem is to find a truth assignment to variables in $X$ such that the logical AND
of all clauses in $C$ is satisfied.

Let $\phi = (X, C)$ be an instance of 3SAT. We can define an instance of the
DAG alignment problem as follows. We begin by defining the two DAGs used in
the alignment. First, let us define $G_1 = (V_1, E_1)$, where $V_1$ is defined as follows:

$$V_1 = \bigcup_{i \leq m} (x_j^{p_1}, i) \cup (x_k^{p_2}, i) \cup (x_l^{p_3}, i),$$ where

$c_i = (x_j^{p_1}, x_k^{p_2}, x_l^{p_3})$ and $p_1, p_2, p_3 \in \{0, 1\}$ and $j, k, l \leq n$.

We define the set of edges $E_1$ by creating directed edges over the vertices of $V_1$
as $((x_j^0, i), (x_j^1, t))$ for all $j \leq n$ and $i, t \leq m$.

Now, let us define a second DAG, $G_2 = (V_2, E_2)$. First, we define $V_2$ as

$$V_2 = \{\{y_1, z_1, y_2, z_2, \ldots, y_n, z_n\} \times \{1, 2, \ldots, m\}\} \cup \{\{c_1, c_2, \ldots, c_m\} \times \{1, 2\}\}.$$

The intuition behind this definition is that $y_i$ corresponds to $x_i^0$ and $z_i$ corresponds to $x_i^1$.

We define $E_2$ by creating directed edges $((z_j, i), (y_j, t))$, $((y_j, t), (c_i, 1))$ and
$((y_j, t), (c_i, 2))$ for all $j \leq n$ and $i, t \leq m$.

We now have two DAGs, $G_1$ and $G_2$. We must define the set $\beta$, which describes
the possible matches between the two DAGs, and the related weights.
For every vertex $(x_j^0, i)$ or $(x_k^1, t)$ in $V_1$, map this vertex to its corresponding vertex
in $V_2$ with weight equal to one and add this to $\beta$. That is, $(x_j^0, i) \in V_1$ maps to
$(y_j, i) \in V_2$ and $(x_k^1, t) \in V_1$ maps to $(z_k, t) \in V_2$, and so forth. Also, for each vertex
$(x_j^p, i) \in V_1$, create mappings $((x_j^p, i), (c_i, 1))$ and $((x_j^p, i), (c_i, 2))$, both with
weight equal to one, and add these to $\beta$. Let the total weight and the total number
of vertices for the matching be $3m$.

We now show that the DAG alignment problem, as described above, has
a matching satisfying the hierarchical mapping constraint if and only if $\phi$ is
satisfiable.

($\Rightarrow$) Assume $\phi$ is satisfiable. For each clause $c_i$, choose a single literal $x_j^p$. If
variable $x_j \in X$ is true and $p = 0$ or $x_j \in X$ is false and $p = 1$, then include edge
$((x_j^p, i), (y_j, i))$ in the matching $M$. Also, for any clause $c_t$ with $x_j^1$ include edge
$((x_j^1, t), (c_t, 1))$ if vertex $(c_t, 1)$ is not in the matching, otherwise include edge
$((x_j^1, t), (c_t, 2))$. Similarly, if variable $x_j \in X$ is true and $p = 1$ or $x_j \in X$ is false
and $p = 0$, then include $((x_j^p, i), (z_j, i))$ in the matching. Also, for any clause
$c_t$ with $x_j^0$, include edge $((x_j^0, t), (c_t, 1))$ if vertex $(c_t, 1)$ is not in the matching,
otherwise include edge $((x_j^0, t), (c_t, 2))$. Thus, $M$ exactly maps all vertices in $G_1$
to vertices in $G_2$. There are $3m$ vertices in $V_1$, so $|M| = 3m$. Also, since the
weight of each edge is one, $w(M) = 3m$. Finally, since both $x_j^0$ and $x_j^1$ cannot be
true, both edges $((x_j^0, i), (y_j, i))$ and $((x_j^1, i), (z_j, i))$ cannot be in $M$, therefore
the hierarchical constraint is satisfied.

($\Leftarrow$) Let $M$ be a solution to the DAG alignment problem. The truth value of
any variable $x_j$ is assigned as follows. If, for any clause $c_i$ with literal $x_j^0$, there
exists an edge $((x_j^0, i), (y_j, i))$ from $G_1$ to $G_2$, then let $x_j$ be true. Similarly, if
there exists an edge $((x_j^1, i), (z_j, i))$ from $G_1$ to $G_2$, then let $x_j$ be false. Since
in $G_1$ every vertex $(x_j^0, i)$ has an edge to every $(x_j^1, t)$, and in $G_2$ every vertex
$(z_j, i)$ has an edge to every $(y_j, t)$, $M$ cannot contain both edges $((x_j^0, i), (y_j, i))$ and
$((x_j^1, i), (z_j, i))$, otherwise the hierarchical constraint would be violated. Thus, $x_j^0$
or $x_j^1$ is true, but never both. Also, since any false literal in a clause $c_i$ is mapped
to a vertex $(c_i, 1)$ or $(c_i, 2)$, at most 2 literals in any clause can be false. Thus,
$\phi$ is satisfied.

4 Upper bounds on approximating weighted DAG alignment

Since weighted DAG alignment belongs to the class of NP-complete problems, 
it is unlikely that we will find a polynomial time solution to the problem. Thus, 
we must rely on an approximation scheme for computing alignments. 

In this section, we introduce two polynomial time reductions of the DAG 
alignment problem to other known NP-complete problems and use these to 
provide upper bounds for approximating the weighted DAG alignment problem. 
The quality of the approximation is given as the ratio between the size of the 
maximum weighted DAG alignment and the approximation found. The ratio in 
the worst-case scenario defines the performance guarantee of the algorithm. 

We begin by reducing the DAG alignment problem to Weighted Independent 
Set (WIS). In the Independent Set problem, we are given a graph G = (V,E), 
and we wish to find the largest subset $S \subseteq V$, such that no two vertices in $S$ are
connected by an edge in $E$. In the weighted version of this problem, each node,
$v_i \in V$, has an associated weight $w_i$, and we wish to find the maximum weighted
independent set. 

Håstad [13] showed that Independent Set is hard to approximate within $n^{1-\epsilon}$,
for $\epsilon > 0$, unless NP-hard problems have randomized polynomial time solutions.
Boppana and Halldórsson introduced the Ramsey algorithm for solving
WIS. The algorithm is an extension of the naive greedy approach, where in the 
greedy approach a vertex v is arbitrarily selected from the graph and added to the 
independent set, all adjacent vertices are removed, and this process is continued 
until all vertices are exhausted. The obvious problem with this solution is that 
the adjacencies are ignored. The first extension to this process is to consider not 
only the vertex v, but also the neighbors of v. The algorithm recurses by first 
considering v as part of the independent set, and then v not in the independent 
set, and selecting the better of the two results. This algorithm performs well 
provided the maximum clique size is small. Boppana and Halldórsson further
extended this algorithm by first removing the maximum set of disjoint $k$-cliques,
and then applying the Ramsey algorithm to compute the independent set on this
modified graph. From this, they were able to prove that the algorithm had a
performance guarantee of $O(n/\log^2 n)$, where $n$ is the number of vertices in the
graph.
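The naive greedy procedure and the include/exclude recursion described above can be sketched as follows. This is not Boppana and Halldórsson's Ramsey algorithm or its clique-removal extension; it is only an illustration of the two simpler ideas, with hypothetical function names. The greedy variant here picks the heaviest remaining vertex (the text allows an arbitrary choice), and the recursion is exact but takes exponential time.

```python
def greedy_is(adj, weight):
    """Greedy independent set: take a vertex, discard its neighbours, repeat."""
    remaining, chosen = set(adj), set()
    while remaining:
        # Selection rule is an assumption: heaviest vertex, ties broken by name.
        v = max(remaining, key=lambda u: (weight[u], str(u)))
        chosen.add(v)
        remaining -= adj[v] | {v}
    return chosen

def branch_is(adj, weight, remaining=None):
    """Exact recursion: best of 'v in the set' and 'v not in the set'."""
    if remaining is None:
        remaining = frozenset(adj)
    if not remaining:
        return frozenset(), 0.0
    v = min(remaining, key=str)
    with_v, w_in = branch_is(adj, weight, remaining - adj[v] - {v})
    without_v, w_out = branch_is(adj, weight, remaining - {v})
    if w_in + weight[v] >= w_out:
        return with_v | {v}, w_in + weight[v]
    return without_v, w_out

# Path a - b - c where the middle vertex is the heaviest single vertex.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
weight = {"a": 1.0, "b": 1.5, "c": 1.0}
print(greedy_is(adj, weight), branch_is(adj, weight))
```

On this path the greedy choice commits to the middle vertex and misses the heavier pair of end vertices, which the recursion finds.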



The following shows that any instance of the DAG alignment problem can 
be reduced, in polynomial time, to an instance of WIS. This reduction will 
allow us to use approximation strategies for Independent Set to find approximate 
solutions to the DAG alignment problem. 

Theorem 2. The DAG alignment problem can be approximated within $O(m/\log^2 m)$,
where $m = |\beta|$.

Proof. Consider an instance of the DAG alignment problem, defined by graphs
$G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, and the set of edges $\beta$. We define an instance
of WIS by constructing a graph $G = (V, E)$ as follows. For each edge
$e = (a, b, w_e) \in \beta$, construct a corresponding $v \in V$, and let the weight of vertex $v$
be $w := w_e$. Next, let $E = \{(v_i, v_j) \mid e_j \in conf(e_i) \text{ and } e_i, e_j \in \beta\}$.

Now, we claim that a solution to WIS, defined over graph $G$, corresponds to
a solution to the DAG alignment problem. We construct this solution as follows.
Let $S$ be our solution to WIS. Then, for each $v_i \in S$, add the edge from $\beta$ that
corresponds to $v_i$ to our DAG alignment solution $M$. This precisely constructs
a valid DAG alignment, since each $v_i \in S$ cannot be connected to any other
$v_j \in S$, which implies that for edges $e_i, e_j \in M$, $e_i \notin conf(e_j)$. Since no edges in
$M$ conflict, this must be a valid solution.

WIS can be approximated within $O(n/\log^2 n)$, where $n$ is the number of
vertices in the graph. In our reduction, $n$ corresponds to $|\beta|$; by letting $m = |\beta|$,
we achieve an approximation of $O(m/\log^2 m)$.
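The construction in this proof can be sketched as a conflict graph builder. The sketch assumes that conditions 1 and 2 of Definition 1 mean "a is an ancestor (descendant) of f, but b is not an ancestor (descendant) of g"; that reading and all function names are assumptions for illustration. Candidate matches are identified by their index in `beta`.

```python
from itertools import combinations

def proper_descendants(edges):
    """Transitive closure of a DAG given as (u, v) pairs: desc[v] excludes v itself."""
    verts = {x for e in edges for x in e}
    desc = {v: {b for a, b in edges if a == v} for v in verts}
    changed = True
    while changed:
        changed = False
        for v in verts:
            extra = set().union(*(desc[c] for c in desc[v])) - desc[v]
            if extra:
                desc[v] |= extra
                changed = True
    return desc

def conflict(e, d, desc1, desc2):
    """Edge conflict test under the assumed reading of Definition 1."""
    (a, b, _), (f, g, _) = e, d
    if a == f or b == g:
        return True
    anc1, anc2 = f in desc1.get(a, ()), g in desc2.get(b, ())
    dsc1, dsc2 = a in desc1.get(f, ()), b in desc2.get(g, ())
    return (anc1 and not anc2) or (dsc1 and not dsc2)

def conflict_graph(beta, edges1, edges2):
    """WIS instance: vertex i stands for beta[i]; edges join conflicting matches."""
    d1, d2 = proper_descendants(edges1), proper_descendants(edges2)
    adj = {i: set() for i in range(len(beta))}
    for i, j in combinations(range(len(beta)), 2):
        if conflict(beta[i], beta[j], d1, d2):
            adj[i].add(j)
            adj[j].add(i)
    weight = {i: beta[i][2] for i in range(len(beta))}
    return adj, weight

def independent_set_to_matching(S, beta):
    """Map an independent set of vertex indices back to candidate matches."""
    return {beta[i] for i in S}

# Toy instance: G1 is A1 -> A2, G2 is B1 -> B2.
beta = [("A1", "B1", 0.9), ("A1", "B2", 0.8), ("A2", "B1", 0.7), ("A2", "B2", 0.4)]
adj, weight = conflict_graph(beta, [("A1", "A2")], [("B1", "B2")])
print(adj)
```

Any independent set algorithm can then be run on `adj` and `weight`; in the toy instance the only independent set of size two is `{0, 3}`, the order-preserving pair.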

Next, we improve this bound via a reduction to the Weighted Set Packing 
(WSP) problem. In WSP, we have a set S of m base elements, and a collection 
$U = \{U_1, U_2, \ldots, U_n\}$ of weighted subsets of $S$. We want to find a subcollection
$W \subseteq U$ of disjoint sets of maximum total weight.

In [15], an approximation guarantee of $\sqrt{m}$, where $m = |S|$, is given for
WSP. The algorithm is based on a variant of the greedy algorithm for solving
the non-weighted version introduced in [14]. In the following theorem, we show
that any instance of the DAG alignment problem can be reduced to WSP in 
polynomial time, and that a solution to WSP corresponds to a solution of the 
DAG alignment problem. 

Theorem 3. The DAG alignment problem can be approximated within $\sqrt{m}$,
where $m = |\beta|$.

Proof. Consider an instance of the DAG alignment problem, defined by graphs
$G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, and the set of edges $\beta$. We define an instance
of WSP by constructing $S$ and the collection $U$ as follows.

We let our $m$ base elements be the edges specified by $\beta$, thus our set $S = \beta$.
We construct the collection $U$ by defining subsets $U_i$ for all $e_i \in \beta$ as
$U_i = \{e_i\} \cup conf(e_i)$. Let the weight of $U_i$ be equal to $w_{e_i}$. We now claim that any
solution to WSP, $W$, corresponds to a solution to the DAG alignment problem.

We can see this by considering any $W$. We construct a solution to the DAG
alignment problem by taking each $U_i \in W$, and adding edge $e_i \in \beta$ to our
DAG alignment solution $M$. This is a valid matching because the sets $U_i \in W$
are pairwise disjoint, which implies that for each $e_i \in M$ and $e_j \in M$, $e_i \notin conf(e_j)$, so no
edges in $M$ conflict.

Since a solution to WSP yields a solution to the DAG alignment problem,
approximations of WSP correspond to approximations of the DAG alignment
problem. Hence, we can approximate the DAG alignment problem within $\sqrt{m}$,
where $m = |\beta|$.
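The set packing instance from this proof can be sketched as below. Candidate matches are referred to by index, and the conflict sets are supplied directly as a hypothetical dictionary `conf`. The greedy ranking rule used here (weight divided by the square root of the set size) is an assumption chosen for illustration and may differ from the algorithm of [15].

```python
import math

def build_wsp(weights, conf):
    """For each candidate i, build U_i = {e_i} union conf(e_i) with weight w_{e_i}."""
    return {i: (frozenset({i} | conf[i]), weights[i]) for i in weights}

def greedy_packing(sets):
    """Pick pairwise disjoint sets greedily, best weight / sqrt(size) first."""
    order = sorted(sets, key=lambda i: sets[i][1] / math.sqrt(len(sets[i][0])),
                   reverse=True)
    chosen, used = [], set()
    for i in order:
        members, _ = sets[i]
        if used.isdisjoint(members):
            chosen.append(i)
            used |= members
    return chosen

# Hypothetical instance with four candidate matches and their conflict sets.
weights = {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.4}
conf = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
sets = build_wsp(weights, conf)
print(greedy_packing(sets))
```

In this instance candidates 0 and 3 do not conflict with each other, yet their sets overlap in the shared conflicting candidates 1 and 2, so the packing keeps only candidate 0. Disjointness of the sets is thus sufficient for a conflict-free matching but not necessary.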

5 Polynomial-time algorithms 

In this section we study certain types/classes of graphs with respect to their DAG 
alignment problem solution complexity. In particular, we show that the DAG 
alignment problem for trees has a polynomial time solution. In this work, we 
naturally define trees to be those directed trees with all edges directed away from 
a particular vertex called the root. In this section we show that any two such trees 
can be aligned in polynomial time. Furthermore, a chain $C_n$ is defined as a DAG
with $n$ vertices $v_1, v_2, \ldots, v_n$ and directed edges $(v_1, v_2), (v_2, v_3), \ldots, (v_{n-1}, v_n)$.

Theorem 4. Any two trees can be aligned in polynomial time. 

Proof. We first describe the data structure used in our algorithm, and then
explain how it can be used to achieve a polynomial time algorithm that aligns
two trees. Our algorithm is a bottom-up approach that applies weighted
bipartite matching at each of the $n \times k$ iterations it makes.

Suppose we have two trees $T^1 = (V^1, E^1)$ and $T^2 = (V^2, E^2)$, with $n$ and
$k$ vertices respectively, that need to be aligned. Create an array with $n \times k$
empty cells $C(i,j)$ ($i = 1..n$, $j = 1..k$) that contain real numbers whose values
will be assigned during the algorithm and will hold the value of the best alignment
of the subtree of $T^1$ rooted at $v_i^1$ with the subtree of $T^2$ rooted at $v_j^2$.
This array is complemented by an equal size array $M(i,j)$ ($i = 1..n$, $j = 1..k$)
that contains the actual matchings used for the assigned values of $C(i,j)$. We
will further describe how to assign values to $C(i,j)$, sometimes omitting the
discussion of updates to $M(i,j)$. Our algorithm terminates when $C(n,k)$ gets
assigned a value. Once this is done, the value stored in $C(n,k)$ equals the
maximum weight alignment and $M(n,k)$ contains the best matching.

We next describe the total order on the set of vertices of both trees and
order cells $C(i,j)$. Consider tree $T^1$. Suppose its depth is $d$. Name all vertices
at level $d$, $v_1^1$ through $v_{d_1}^1$ (for instance, name these vertices in the left
to right order assuming the tree is drawn on paper with no edge intersections),
for appropriate value of $d_1$. Next, name all depth $d - 1$ vertices, $v_{d_1+1}^1$
through $v_{d_2}^1$, for appropriate value $d_2$. Continue this operation until all vertices
are named. Vertex $v_n^1$ is thus the root of tree $T^1$. Apply the same method to
enumerate vertices in tree $T^2$. Cells $C(i,j)$ are ordered lexicographically, e.g.
$C(1,1) \prec C(1,2) \prec \ldots \prec C(1,k) \prec C(2,1) \prec C(2,2) \prec \ldots \prec C(n,k)$. We fill
values $C(i,j)$ (and keep track of alignment made by updating $M(i,j)$) in this
order.



$C(1,1)$ is easy to find, because it is equal to the weight of edge $e = (v_1^1, v_1^2, w_e)$
of the matching problem. To find the value of $C(i,j)$ (and update $M(i,j)$), consider
the following cases (with $C(i,j)$ taking the maximal value among those
found in each of the cases below):

1. $v_i^1$ does not get mapped anywhere. In this case, $C(i,j) = \max\{C(v_s^1, v_j^2) \mid v_s^1 \in child(v_i^1)\}$.
Each such $C(v_s^1, v_j^2) \prec C(i,j)$ and thus the maximum is well
defined and can be calculated. The computational cost of this calculation is
the number of children of $v_i^1$, i.e. no more than $n$.

2. $v_i^1$ is mapped to $v_k^2 \in desc(v_j^2)$, and hence $k < j$. In this case,
$C(i,j) = w(v_i^1, v_k^2) + S$, where $S$ is the answer to the following weighted bipartite
matching problem. Assuming $v_i^1$ has children $child_1^1, child_2^1, \ldots, child_{i_c}^1$ and
$v_k^2$ has children $child_1^2, child_2^2, \ldots, child_{k_c}^2$, the maximum bipartite matching
problem whose solution is the number $S$ we are interested in is defined for the
complete bipartite graph with vertices $child_1^1, child_2^1, \ldots, child_{i_c}^1$, $child_1^2, child_2^2,
\ldots, child_{k_c}^2$ and edges with weights $C(child_s^1, child_t^2)$, $s = 1..i_c$, $t = 1..k_c$. Note that
all such weights are known and thus the problem is well defined. The solution
to the maximum weighted bipartite matching can be found in polynomial
time, and the number of times we call for a solution is limited by the number
of descendants of $v_j^2$, which is never more than $k$ (the number of vertices in
tree $T^2$). Thus, this step can be completed in polynomial time.

The number of different $C(i,j)$ is polynomial, and the amount of work required
to fill in each value is polynomial. Thus, our algorithm is polytime. For
two trees with $n$ vertices each, the complexity of our algorithm is $O(n^6)$: there
are $n^2$ numbers $C(i,j)$ to calculate, and calculation of each requires (item 2)
at most $n \times n^3$ operations assuming the Hungarian algorithm [18] for weighted
bipartite matching is used.

It appears that the complexity of the DAG alignment problem moves from P 
to NP-complete in transition from trees to DAGs. The part of the above proof 
that works for trees and breaks for DAGs is the ability to establish an order on 
the numbers C(i,j) such that once a particular C(i,j) has been calculated it 
never needs to get updated. 

The described polynomial time algorithm requires $O(n^6)$ runtime to align two
trees. However, for some simpler types of trees the polynomial time complexity 
can be reduced through considering simplified and modified versions of the above 
algorithm. A detailed description of such algorithms is out of scope for this paper. 
However, we would like to mention that two chains (with n vertices each) can 
be aligned with a cost of $O(n^3)$ and two complete binary trees with the cost of
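For the chain case, a conflict-free matching must preserve the order of the vertices along both chains, so a standard dynamic program over prefixes suffices. The sketch below is not the authors' modified algorithm; it is a textbook alignment recurrence, given under the assumption that the weights are supplied as a matrix `w`, with hypothetical names throughout.

```python
def align_chains(w):
    """w[i][j] is the weight of matching vertex i of chain 1 to vertex j of chain 2.
    Returns the best total weight and the order-preserving list of matched pairs."""
    n = len(w)
    k = len(w[0]) if w else 0
    # D[i][j] = best weight using the first i vertices of chain 1
    # and the first j vertices of chain 2.
    D = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            D[i][j] = max(D[i - 1][j],                        # vertex i unmatched
                          D[i][j - 1],                        # vertex j unmatched
                          D[i - 1][j - 1] + w[i - 1][j - 1])  # match i with j
    # Trace back through the table to recover one optimal matching.
    pairs, i, j = [], n, k
    while i > 0 and j > 0:
        if D[i][j] == D[i - 1][j]:
            i -= 1
        elif D[i][j] == D[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    return D[n][k], pairs[::-1]

total, pairs = align_chains([[0.9, 0.8], [0.7, 0.4]])
print(total, pairs)
```

The table has $(n+1)(k+1)$ cells and each is filled in constant time, so this sketch runs in time proportional to $n \times k$ for chains with $n$ and $k$ vertices.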

6 Conclusions 

We introduced a new type of weighted matching problem called the weighted
hierarchical DAG alignment problem. We formalized this problem, showed that
it is NP-complete, proved several upper bounds for approximating solutions to
the problem, and finally introduced algorithms for solving different classes of the
problem. This problem developed through our research on ontology alignment;
however, it relates to many different applications, including, but not limited to,
UML diagram comparison, SVG document comparison, and file/folder mapping.
Our results show that, in particular, the file/folder mapping problem can be solved
in polynomial time, since the underlying data structure is a tree.

In the future, we plan to find other classes of DAGs that can be aligned faster 
than with an exponential time algorithm, work on designing efficient heuristics, 
and finally apply some of these ideas to the problem of aligning ontologies. 

With ontologies, the problem becomes even more complex because they can 
contain errors in their specification, meaning that in some circumstances the 
hierarchical constraint must be relaxed. Moreover, this is likely the case with 
other applications of the problem. Thus, it may also be an interesting problem 
to investigate approximate solutions that are allowed to contain a small number 
of edge conflicts, which will accommodate some human error in an ontology
specification. 